Qserv: a distributed shared-nothing database for the LSST catalog
The LSST project will provide public access to a database catalog that, in its final year, is estimated to include 26 billion stars and galaxies with tens of trillions of detections, occupying multiple petabytes. Because we are not aware of an existing open-source database implementation that has been demonstrated to efficiently satisfy astronomers' spatial self-joining and cross-matching queries at this scale, we have implemented Qserv, a distributed shared-nothing SQL database query system. To speed development, Qserv relies on two successful open-source software packages: the MySQL RDBMS and the Xrootd distributed file system. We describe Qserv's design, architecture, and ability to scale to LSST's data requirements. We illustrate its potential with test results on a 150-node cluster using 55 billion rows and 30 terabytes of simulated data. These results demonstrate the soundness of Qserv's approach and the scale it achieves on today's hardware.
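The abstract calls Qserv a distributed shared-nothing system but does not spell out how a query runs across nodes. The sketch below illustrates a common scatter-gather pattern for such systems: each worker owns a disjoint partition of the rows, the same query fragment runs locally on every partition, and a coordinator merges the partial results. It is a generic Python illustration, not Qserv's code; the declination-band partitioning, the toy (obj_id, ra, dec) rows, and all function names are hypothetical, and threads stand in for separate nodes.

```python
from concurrent.futures import ThreadPoolExecutor


def shard_for(dec, n_workers):
    """Hypothetical scheme: map a declination in [-90, 90] to one of n_workers equal-width bands."""
    idx = int((dec + 90.0) * n_workers / 180.0)
    return min(idx, n_workers - 1)  # dec == +90 would otherwise fall past the last band


def make_shards(rows, n_workers):
    """Split (obj_id, ra, dec) rows into disjoint per-worker shards; no row is stored twice."""
    shards = [[] for _ in range(n_workers)]
    for row in rows:
        shards[shard_for(row[2], n_workers)].append(row)
    return shards


def worker_count(shard, predicate):
    """Query fragment executed locally on one worker: count the rows matching the predicate."""
    return sum(1 for row in shard if predicate(row))


def scatter_gather_count(shards, predicate):
    """Scatter the same fragment to every shard in parallel, then gather and merge the partial counts."""
    with ThreadPoolExecutor(max_workers=len(shards)) as pool:
        partials = pool.map(lambda shard: worker_count(shard, predicate), shards)
        return sum(partials)


if __name__ == "__main__":
    # Toy catalog with integer declinations from -90 to 89
    rows = [(i, float(i % 360), (i % 180) - 90) for i in range(1000)]
    shards = make_shards(rows, n_workers=4)
    print(scatter_gather_count(shards, lambda r: r[2] >= 0))  # prints 460
```

Because the partitions are disjoint, summing the per-worker counts gives the same answer as a single sequential scan; a real system would also use the partitioning to skip shards that cannot match a spatial predicate.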
Report from the 2nd Workshop on Extremely Large Databases
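In both science and industry, the complexity of large-scale analysis has increased greatly in recent years. Analysts are struggling to apply sophisticated techniques, such as time-series analysis and classification algorithms, because the tools they are familiar with, though powerful, scale poorly and cannot make effective use of scalable database systems. The main goals of the 2nd XLDB workshop were to understand these problems, analyze their root causes, and identify solutions. The workshop also discussed building SciDB, a new open-source scientific database, an idea proposed at the 1st XLDB workshop (XLDB2007). This report summarizes the workshop's activities and discussions.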
On the Verge of One Petabyte - the Story Behind the BaBar Database System
The BaBar database has pioneered the use of a commercial ODBMS within the HEP community. The unique object-oriented architecture of Objectivity/DB has made it possible to manage over 700 terabytes of production data generated since May 1999, making the BaBar database the world's largest known database. Ongoing development includes new features that address the ever-increasing luminosity of the detector as well as other changing physics requirements. Significant efforts are focused on reducing space requirements and operational costs. The paper discusses our experience with developing a large-scale database system, emphasizing universal aspects that may be applied to any large-scale system, independently of the underlying technology used.
Comment: Talk from the 2003 Computing in High Energy and Nuclear Physics (CHEP03), La Jolla, CA, USA, March 2003, 6 pages. PSN MOKT01
Real-time Data Access Monitoring in Distributed, Multi-petabyte Systems
Petascale systems are in existence today and will become common in the next few years. Such systems are inevitably very complex, highly distributed, and heterogeneous. Monitoring a petascale system in real time and understanding its status at any given moment without impacting its performance is a highly intricate task. Common approaches and off-the-shelf tools are unusable, do not scale, or severely impact the performance of the monitored servers. This paper describes unobtrusive monitoring software developed at the Stanford Linear Accelerator Center (SLAC) for a highly distributed petascale production data set. The paper describes the solutions employed, the lessons learned, and the problems still to be addressed, and explains how the system can be reused elsewhere.
Agile software development in an earned value world: a survival guide
Agile methodologies are current best practice in software development. They are favored for, among other reasons, preventing premature optimization by taking a somewhat short-term focus, and allowing frequent replans and reprioritizations of upcoming development work based on recent results and the current backlog. At the same time, funding agencies prescribe earned value management accounting for large projects, which these days inevitably include substantial software components. Earned value approaches emphasize a more comprehensive and typically longer-range plan, and tend to characterize frequent replans and reprioritizations as indicative of problems. Here we describe the planning, execution, and reporting framework used by the LSST Data Management team, which navigates these opposing tensions.
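For readers unfamiliar with earned value management, the sketch below computes its standard indicators from three quantities: planned value (PV), earned value (EV), and actual cost (AC). The schedule and cost variances are EV - PV and EV - AC; the schedule and cost performance indices (SPI, CPI) are EV/PV and EV/AC, with values below 1 signalling work behind schedule or over budget. These are the general textbook EVM definitions, not the LSST Data Management framework described above; the class name and the numbers in the example are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class EVSnapshot:
    """Hypothetical status snapshot of a project at one reporting date."""
    planned_value: float  # PV: budgeted cost of the work scheduled to date
    earned_value: float   # EV: budgeted cost of the work actually completed to date
    actual_cost: float    # AC: actual cost incurred for the completed work

    def schedule_variance(self) -> float:
        """SV = EV - PV; negative means behind schedule."""
        return self.earned_value - self.planned_value

    def cost_variance(self) -> float:
        """CV = EV - AC; negative means over budget."""
        return self.earned_value - self.actual_cost

    def spi(self) -> float:
        """Schedule performance index, EV / PV."""
        return self.earned_value / self.planned_value

    def cpi(self) -> float:
        """Cost performance index, EV / AC."""
        return self.earned_value / self.actual_cost


if __name__ == "__main__":
    snap = EVSnapshot(planned_value=100.0, earned_value=80.0, actual_cost=90.0)
    print(snap.schedule_variance(), snap.cost_variance(), snap.spi())  # prints -20.0 -10.0 0.8
```

Since SPI and CV are measured against a fixed baseline plan, reshuffling upcoming work changes what counts as "planned", which is one way to see why frequent replans sit uneasily with EVM reporting.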
Designing a Multi-Petabyte Database for LSST
The 3.2-gigapixel LSST camera will produce approximately half a petabyte of archive images every month. These data need to be reduced in under a minute to produce real-time transient alerts, and then added to the cumulative catalog for further analysis. The catalog is expected to grow by about 300 terabytes per year. The data volume, the real-time transient alerting requirements of the LSST, and its spatio-temporal aspects require innovative techniques to build an efficient data access system at reasonable cost. As currently envisioned, the system will rely on a database for catalogs and metadata. Several database systems are being evaluated to understand how they perform at these data rates, data volumes, and access patterns. This paper describes the LSST requirements, the challenges they impose, the data access philosophy, results to date from evaluating available database technologies against LSST requirements, and the proposed database architecture to meet the data challenges.
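The rates quoted above can be turned into a rough cumulative estimate. The sketch assumes linear growth at the stated rates, decimal units (1 PB = 1000 TB), and the ten-year survey duration given in the LSST Science Book abstract; it ignores replication, indexes, and any change in rates over time, so it is an order-of-magnitude illustration only. The constant and function names are hypothetical.

```python
# Rates quoted in the abstract
IMAGE_PB_PER_MONTH = 0.5     # archive images, petabytes per month
CATALOG_TB_PER_YEAR = 300    # catalog growth, terabytes per year

# Assumptions: ten-year survey, decimal units
SURVEY_YEARS = 10
TB_PER_PB = 1000


def image_archive_pb(years):
    """Cumulative image archive in PB, assuming a constant monthly rate."""
    return IMAGE_PB_PER_MONTH * 12 * years


def catalog_pb(years):
    """Cumulative catalog size in PB, assuming constant yearly growth."""
    return CATALOG_TB_PER_YEAR * years / TB_PER_PB


if __name__ == "__main__":
    print(image_archive_pb(SURVEY_YEARS), catalog_pb(SURVEY_YEARS))  # prints 60.0 3.0
```

Under these assumptions the image archive reaches about 60 PB and the catalog about 3 PB after ten years, consistent with the "multiple petabytes" cited for the catalog in the Qserv abstract.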
LSST Science Book, Version 2.0
A survey that can cover the sky in optical bands over wide fields to faint magnitudes with a fast cadence will enable many of the exciting science opportunities of the next decade. The Large Synoptic Survey Telescope (LSST) will have an effective aperture of 6.7 meters and an imaging camera with a field of view of 9.6 deg^2, and will be devoted to a ten-year imaging survey over 20,000 deg^2 south of +15 deg. Each pointing will be imaged 2000 times with fifteen-second exposures in six broad bands from 0.35 to 1.1 microns, to a total point-source depth of r~27.5. The LSST Science Book describes the basic parameters of the LSST hardware, software, and observing plans. The book discusses educational and outreach opportunities, then goes on to describe a broad range of science that LSST will revolutionize: mapping the inner and outer Solar System, stellar populations in the Milky Way and nearby galaxies, the structure of the Milky Way disk and halo and other objects in the Local Volume, transient and variable objects both at low and high redshift, and the properties of normal and active galaxies at low and high redshift. It then turns to far-field cosmological topics, exploring properties of supernovae to z~1, strong and weak lensing, the large-scale distribution of galaxies and baryon oscillations, and how these different probes may be combined to constrain cosmological models and the physics of dark energy.
Comment: 596 pages. Also available at full resolution at http://www.lsst.org/lsst/sciboo